
LOADTEST: Different approach to profile reconciliation (Go code label handlers and no big SQL diff) #45573

Draft
MagnusHJensen wants to merge 5 commits into main from claude/optimize-apple-reconciler-CDx6h

Conversation

@MagnusHJensen
Member

No description provided.

claude added 3 commits May 15, 2026 08:48
… handlers

Optimizes the Apple profile reconciler path by scanning hosts in bounded
batches (default 5k per tick via a host_uuid cursor in Redis) and
computing desired state in Go using per-label-mode handlers instead of a
large MySQL UNION join.

Each tick:
  1. ListAppleMDMHostsForReconcileBatch pulls the next batch of
     Apple-enrolled host UUIDs by cursor (no profile-status check).
  2. ListAppleProfilesForReconcile loads the full profile catalog and
     label assignments once per tick.
  3. BulkGetHostLabelMemberships and BulkGetHostMDMAppleProfilesByUUIDs
     load the per-batch label memberships and current state.
  4. computeAppleReconcileDeltas dispatches each (host, profile) pair
     to one of four in-code handlers: no-labels, include-all,
     include-any, exclude-any. Broken-label and dynamic-label-timing
     semantics match the legacy SQL.
  5. The downstream CA-throttle, user-enrollment, host-being-set-up
     skip, BulkUpsertMDMAppleHostProfiles, and ProcessAndEnqueueProfiles
     flow is reused unchanged.

Gated by FLEET_MDM_APPLE_BATCHED_RECONCILER=true so the legacy path
stays the default. The cursor is persisted in Redis (mysqlredis wrapper)
and resets when a full pass completes, mirroring the Windows reconciler
pattern. Includes unit tests for each handler and the delta computation.
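
The cursor behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `nextBatch` is a hypothetical helper, a sorted in-memory slice stands in for the hosts table, and a plain string variable stands in for the cursor that the PR persists in Redis. It assumes host UUIDs are unique and that an empty cursor means "start of a pass".

```go
package main

import (
	"fmt"
	"sort"
)

// nextBatch is a hypothetical helper. It returns up to limit UUIDs that sort
// strictly after cursor, plus the cursor to store for the next tick.
// An empty returned cursor means the pass reached the end of the host list,
// so the next tick starts over from the beginning.
// sortedUUIDs must be sorted ascending and contain no duplicates.
func nextBatch(sortedUUIDs []string, cursor string, limit int) ([]string, string) {
	if limit <= 0 {
		return nil, cursor // nothing to scan; leave the cursor where it is
	}
	// Index of the first UUID >= cursor; skip the cursor itself if present.
	start := sort.SearchStrings(sortedUUIDs, cursor)
	if start < len(sortedUUIDs) && sortedUUIDs[start] == cursor {
		start++
	}
	end := start + limit
	if end >= len(sortedUUIDs) {
		// Final (possibly short or empty) batch: reset the cursor.
		return sortedUUIDs[start:], ""
	}
	return sortedUUIDs[start:end], sortedUUIDs[end-1]
}

func main() {
	hosts := []string{"a", "b", "c", "d", "e"} // stand-in for Apple-enrolled host UUIDs
	cursor := ""                               // stand-in for the Redis-persisted cursor
	for tick := 1; tick <= 3; tick++ {
		var batch []string
		batch, cursor = nextBatch(hosts, cursor, 2)
		fmt.Printf("tick %d: batch=%v next cursor=%q\n", tick, batch, cursor)
	}
}
```

Because the lookup is "strictly greater than the cursor", a cursor that points at a host deleted between ticks still resumes at the next surviving UUID.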

https://claude.ai/code/session_01Vvy1keXRKZRzDbJQd7dzDn

Drop the FLEET_MDM_APPLE_BATCHED_RECONCILER toggle so load tests don't
need an env var to enable the batched path. The branch always runs
ReconcileAppleProfilesBatched; the legacy ReconcileAppleProfiles
function is still present for diffing/reference but no longer wired
into the cron.

https://claude.ai/code/session_01Vvy1keXRKZRzDbJQd7dzDn

Split AppleProfileForReconcile's single LabelMode + Labels into:
  - IncludeMode (None / All / Any) + IncludeLabels
  - ExcludeLabels (always "exclude any" semantic)

The dispatcher composes the two gates: a profile applies iff the
include gate passes (skipped when IncludeMode == None) AND the
exclude gate passes (skipped when ExcludeLabels is empty). The
existing per-gate handlers are unchanged in semantics; they now take
[]AppleProfileLabelRef directly so they're pure functions composable
in any combination.

Datastore loader partitions label rows by mcpl.exclude into the two
slices and derives IncludeMode from the include-row require_all flag.
Broken-label exemption from removal now considers labels in either
slice.
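
A rough sketch of how the two gates compose, under simplifying assumptions: labels are reduced to bare IDs, host membership is a set, and the broken-label and dynamic-label-timing rules mentioned in this PR are deliberately left out. The type and function names below (`profile`, `allMember`, `anyMember`, `profileApplies`) are hypothetical; only IncludeMode, IncludeLabels and ExcludeLabels come from the commit message.

```go
package main

import "fmt"

type includeMode int

const (
	includeNone includeMode = iota // no include labels: include gate is skipped
	includeAll                     // host must be a member of every include label
	includeAny                     // host must be a member of at least one include label
)

// profile is a simplified stand-in for the per-profile label configuration.
type profile struct {
	IncludeMode   includeMode
	IncludeLabels []uint
	ExcludeLabels []uint // always "exclude any"
}

// allMember reports whether the host is a member of every label in labels.
func allMember(labels []uint, member map[uint]bool) bool {
	for _, id := range labels {
		if !member[id] {
			return false
		}
	}
	return true
}

// anyMember reports whether the host is a member of at least one label.
func anyMember(labels []uint, member map[uint]bool) bool {
	for _, id := range labels {
		if member[id] {
			return true
		}
	}
	return false
}

// profileApplies composes the gates: include gate AND exclude gate.
func profileApplies(p profile, member map[uint]bool) bool {
	switch p.IncludeMode {
	case includeAll:
		if !allMember(p.IncludeLabels, member) {
			return false
		}
	case includeAny:
		if !anyMember(p.IncludeLabels, member) {
			return false
		}
	}
	// With no exclude labels anyMember is false, so the gate passes trivially.
	return !anyMember(p.ExcludeLabels, member)
}

func main() {
	member := map[uint]bool{1: true, 2: true} // labels the host belongs to
	p := profile{IncludeMode: includeAny, IncludeLabels: []uint{1}, ExcludeLabels: []uint{2}}
	fmt.Println(profileApplies(p, member)) // include passes, exclude blocks
}
```

Keeping each gate a pure function of (labels, membership) is what lets the dispatcher combine them freely.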

https://claude.ai/code/session_01Vvy1keXRKZRzDbJQd7dzDn
@codecov

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 17.33150% with 601 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.57%. Comparing base (c19df6d) to head (174b4f3).
⚠️ Report is 43 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/service/apple_mdm_batched.go | 22.80% | 386 Missing ⚠️ |
| server/datastore/mysql/apple_mdm_batched.go | 0.00% | 197 Missing ⚠️ |
| server/datastore/mysqlredis/apple_recon_cursor.go | 0.00% | 17 Missing ⚠️ |
| cmd/fleet/cron.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #45573      +/-   ##
==========================================
- Coverage   66.74%   66.57%   -0.18%     
==========================================
  Files        2740     2745       +5     
  Lines      219163   220004     +841     
  Branches    10947    10947              
==========================================
+ Hits       146283   146458     +175     
- Misses      59649    60320     +671     
+ Partials    13231    13226       -5     

| Flag | Coverage Δ | |
|---|---|---|
| backend | 68.38% <17.33%> (-0.22%) | ⬇️ |


Surface where a tick stalls and which commands the enqueue path fails
on so the cursor-stuck symptom can be traced to a specific step.

New log lines (look for cron=mdm_apple_profile_manager):
  - batched reconcile: listed hosts
  - batched reconcile: loaded profiles
  - batched reconcile: computed deltas (with to_install / to_remove)
  - batched reconcile: before bulk upsert
  - batched reconcile: enqueue complete (succeeded / failed cmd counts)
  - batched reconcile: failed command UUID (per failed cmd, with err)
  - batched reconcile: tick errored; cursor not advanced (with err)
  - batched reconcile: cursor advanced / tick complete, cursor unchanged
  - batched reconcile: ProcessAndEnqueueProfiles returned error

The cursor-advance deferred block was rewritten as a switch so the
outcome (errored / advanced / unchanged) is always logged with the
cursor values, making it easy to spot ticks that stall mid-pass.
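
For illustration, a deferred switch of this shape might look like the sketch below. It is not the code in this PR: `runTick`, `errTickFailed` and the in-memory cursor pointer are hypothetical, and the outcome is returned as a string instead of being logged so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errTickFailed is a hypothetical error used to simulate a failing tick.
var errTickFailed = errors.New("tick failed")

// runTick runs one reconcile tick. work receives the current cursor and
// returns the cursor to store next. The deferred switch runs on every exit
// path, so exactly one of the three outcomes is always reported.
func runTick(cursor *string, work func(cur string) (string, error)) (outcome string, err error) {
	start := *cursor
	next := start
	defer func() {
		switch {
		case err != nil:
			// Leave the stored cursor untouched so the batch is retried.
			outcome = "tick errored; cursor not advanced"
		case next != start:
			*cursor = next
			outcome = "cursor advanced"
		default:
			outcome = "tick complete, cursor unchanged"
		}
	}()
	next, err = work(start)
	return "", err
}

func main() {
	cursor := "a" // stand-in for the persisted cursor
	steps := []func(string) (string, error){
		func(string) (string, error) { return "b", nil },         // progress
		func(string) (string, error) { return "", errTickFailed }, // failure
		func(cur string) (string, error) { return cur, nil },     // no movement
	}
	for _, step := range steps {
		outcome, err := runTick(&cursor, step)
		fmt.Printf("outcome=%q cursor=%q err=%v\n", outcome, cursor, err)
	}
}
```

Checking the error case first means a half-finished tick can never move the cursor, which is what makes a stuck cursor visible in the logs.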

https://claude.ai/code/session_01Vvy1keXRKZRzDbJQd7dzDn

ListAppleProfilesForReconcile used COALESCE(lbl.created_at,
'2000-01-01 00:00:00') with a string-literal default, which made
MySQL coerce the result column to VARCHAR. The MySQL driver returns
that as []uint8, and sql.NullTime.Scan can only accept time.Time or
nil — producing:

  sql: Scan error on column index 4, name "label_created_at":
  unsupported Scan, storing driver.Value type []uint8 into type
  *time.Time

This errored on every tick, so the reconciler never reached the
deferred cursor advance — the cursor stayed pinned at whatever
value a prior successful run had set it to, and no further work
got done.

Drop the COALESCE so the column stays TIMESTAMP. NULL → invalid
NullTime → zero time.Time, which the exclude-any handler already
treats as "no timing check" (matching the broken-label semantics).

https://claude.ai/code/session_01Vvy1keXRKZRzDbJQd7dzDn